Learning a Dictionary of Shape-Components in Visual Cortex: Comparison with Neurons, Humans and Machines
Abstract
In this thesis, I describe a quantitative model that accounts for the circuits and computations of the feedforward path of the ventral stream of visual cortex. This model is consistent with a general theory of visual processing that extends the hierarchical model of [Hubel and Wiesel, 1959] from primary to extrastriate visual areas. It attempts to explain the first few hundred milliseconds of visual processing and “immediate recognition”. One of the key elements in the approach is the learning of a generic dictionary of shape-components from V2 to IT, which provides an invariant representation to task-specific categorization circuits in higher brain areas. This vocabulary of shape-tuned units is learned in an unsupervised manner from natural images, and constitutes a large and redundant set of image features with different complexities and invariances. This theory significantly extends an earlier approach by [Riesenhuber and Poggio, 1999a] and builds upon several existing neurobiological models and conceptual proposals. First, I present evidence to show that the model can duplicate the tuning properties of neurons in various brain areas (e.g., V1, V4 and IT). In particular, the model agrees with data from V4 about the response of neurons to combinations of simple two-bar stimuli [Reynolds et al., 1999] (within the receptive field of the S2 units), and some of the C2 units in the model show a tuning for boundary conformations that is consistent with recordings from V4 [Pasupathy and Connor, 2001]. Second, I show that not only can the model duplicate the tuning properties of neurons in various brain areas when probed with artificial stimuli, but it can also handle the recognition of objects in the real world, to the extent of competing with the best computer vision systems. Third, I describe a comparison between the performance of the model and the performance of human observers in a rapid animal vs. non-animal recognition task for which recognition is fast and cortical back-projections are likely to be inactive. Results indicate that the model predicts human performance extremely well when the delay between the stimulus and the mask is about 50 ms. This suggests that cortical back-projections may not play a significant role when the time interval is in this range, and the model may therefore provide a satisfactory description of the feedforward path. Taken together, the evidence suggests that we may have the skeleton of a successful theory of visual cortex. In addition, this may be the first time that a neurobiological model, faithful to the physiology and the anatomy of visual cortex, not only competes with some of the best computer vision systems, thus providing a realistic alternative to engineered artificial vision systems, but also achieves performance close to that of humans in a categorization task involving complex natural images.
Thesis Supervisor: Tomaso Poggio
Title: Eugene McDermott Professor in the Brain Sciences and Human Behavior
Acknowledgments
I acknowledge my advisor, Prof. Tomaso Poggio (McGovern Institute, MIT), for his mentoring during this thesis and for helping me transition from computer science to neuroscience. In particular, I would like to thank him for sharing his scientific thinking, shaping my own critical judgment and teaching me to always sort important research results from irrelevant ones. I acknowledge the members of my thesis committee, Jim DiCarlo (McGovern Institute, MIT), Earl Miller (Picower Institute, MIT) and Simon Thorpe (CNRS, France), for their patience, guidance and advice.
I acknowledge my collaborators during the five years of this thesis work: first and foremost Tomaso Poggio (MIT) for his substantial contributions to all aspects of this work; Maximilian Riesenhuber (Georgetown) and Jennifer Louie (MIT) for their contributions in the early stages of this thesis; Charles Cadieu (Berkeley) and Minjoon Kouh for their contributions to the comparison between the tuning of model units and V4 data in Chapter 3; Stan Bileschi (MIT) and Lior Wolf (MIT) for their contributions to the comparison between the model and the computer vision systems in Chapter 4; Aude Oliva (MIT) for her contribution to the comparison between the model and human observers in Chapter 5; Rodrigo Sigala (Max Planck Institute) and Martin Giese (Tübingen) for their contributions to extending the present model of the ventral stream to the recognition of biological motion in the dorsal stream in Chapter 6; Dirk Walther (CalTech) and Christof Koch (CalTech) for their contributions to bringing together the present model of “pre-attentive” vision with a model of top-down attention in Chapter 6; Christof Koch (CalTech) and Gabriel Kreiman (MIT) for their contributions to proposing possible roles for the back-projections and to the model of mental imagery in Chapter 6.
I would like to thank my fellow neuroscientists at CBCL, Charles Cadieu (Berkeley), Ulf Knoblich (MIT), Minjoon Kouh (MIT) and Gabriel Kreiman (MIT), for numerous discussions and various contributions to the development of the model. I would also like to thank several people for valuable discussions related to this work: Heinrich Bülthoff, Chou Hung, David Lowe, Pietro Perona, Antonio Torralba and Davide Zoccolan.
Many thanks to Yuri Ivanov, now a friend after a summer internship under his supervision in the summer of 2003 at Honda Research Inc., and to Sanmay Das, my office mate and friend, for making my last three years at CBCL enjoyable.